Overview

Dataset statistics

Number of variables: 28
Number of observations: 15211
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 2
Duplicate rows (%): < 0.1%
Total size in memory: 3.2 MiB
Average record size in memory: 224.0 B
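
The memory figures above are consistent with every one of the 28 columns costing 8 bytes per row. That per-cell cost is an assumption (64-bit numbers or object pointers), not something the report states, and the sketch ignores index overhead. Under that assumption the arithmetic works out as follows:

```python
# Assumption: every column costs 8 bytes per row (64-bit value or pointer).
N_ROWS = 15211
N_COLUMNS = 28
BYTES_PER_CELL = 8  # assumed, not stated in the report

record_size = N_COLUMNS * BYTES_PER_CELL   # bytes per row
column_size = N_ROWS * BYTES_PER_CELL      # bytes per column
total_size = N_ROWS * record_size          # bytes for the whole table

print(f"Average record size: {record_size:.1f} B")
print(f"Per-column size: {column_size / 1024:.1f} KiB")
print(f"Total size: {total_size / 1024**2:.1f} MiB")
```

The per-column figure matches the "Memory size" reported for each variable further down.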

Variable types

CAT: 16
NUM: 12

Reproduction

Analysis started: 2020-06-05 02:43:16.729964
Analysis finished: 2020-06-05 02:43:41.782960
Duration: 25.05 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 2 (< 0.1%) duplicate rows [Duplicates]
injury_city has a high cardinality: 1330 distinct values [High cardinality]
injury_postal has a high cardinality: 1824 distinct values [High cardinality]
injury_state_code has a high cardinality: 53 distinct values [High cardinality]
severity_index_code is highly skewed (γ1 = -20.44536795) [Skewed]
Dependent has 2689 (17.7%) zeros [Zeros]
diff_carrier_employer has 3172 (20.9%) zeros [Zeros]
diff_employer_injury has 11197 (73.6%) zeros [Zeros]

Variables

Dependent
Real number (ℝ≥0)

ZEROS

Distinct count: 4828
Unique (%): 31.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6086.151140621919
Minimum: 0
Maximum: 172488
Zeros: 2689
Zeros (%): 17.7%
Memory size: 118.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 150
median: 434
Q3: 1597.5
95-th percentile: 37133.5
Maximum: 172488
Range: 172488
Interquartile range (IQR): 1447.5

Descriptive statistics

Standard deviation: 19617.156
Coefficient of variation (CV): 3.223244961
Kurtosis: 27.01123325
Mean: 6086.151141
Median Absolute Deviation (MAD): 434
Skewness: 4.921950644
Sum: 92576445
Variance: 384832809.3

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0 | 2689 | 17.7%
154 | 42 | 0.3%
150 | 32 | 0.2%
222 | 28 | 0.2%
180 | 27 | 0.2%
215 | 27 | 0.2%
232 | 24 | 0.2%
3 | 24 | 0.2%
134 | 23 | 0.2%
193 | 23 | 0.2%
Other values (4818) | 12272 | 80.7%

Minimum 5 values

Value | Count | Frequency (%)
0 | 2689 | 17.7%
1 | 2 | < 0.1%
2 | 2 | < 0.1%
3 | 24 | 0.2%
4 | 9 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
172488 | 1 | < 0.1%
171990 | 1 | < 0.1%
171636 | 1 | < 0.1%
171619 | 1 | < 0.1%
171065 | 1 | < 0.1%
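
Several of the figures reported for Dependent can be cross-checked against each other. The sketch below uses only values printed in this report and the standard definitions (mean as sum over count, variance as squared standard deviation, CV as standard deviation over mean, IQR as Q3 minus Q1). It makes no claim about how the report computes these numbers internally.

```python
import math

# Values copied from the report for the variable "Dependent".
n_rows = 15211
total = 92576445
mean = 6086.151141
std = 19617.156
variance = 384832809.3
cv = 3.223244961
q1, q3, iqr = 150, 1597.5, 1447.5
zeros = 2689

# Each check pairs a reported figure with the value implied by the others.
checks = {
    "mean = sum / n": (mean, total / n_rows),
    "variance = std ** 2": (variance, std ** 2),
    "CV = std / mean": (cv, std / mean),
    "IQR = Q3 - Q1": (iqr, q3 - q1),
    "zeros (%)": (17.7, round(100 * zeros / n_rows, 1)),
}

for name, (reported, implied) in checks.items():
    ok = math.isclose(reported, implied, rel_tol=1e-6)
    print(f"{name}: reported={reported}, implied={implied:.6f}, consistent={ok}")
```

The same identities can be applied to any other numeric variable in the report by swapping in its figures.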
 

ave_wkly_wage
Real number (ℝ≥0)

Distinct count: 1911
Unique (%): 12.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1051.9241995924003
Minimum: 2.0
Maximum: 9999.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 118.8 KiB

Quantile statistics

Minimum: 2
5-th percentile: 320
Q1: 1000
median: 1000
Q3: 1000
95-th percentile: 1928
Maximum: 9999
Range: 9997
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 569.5368174
Coefficient of variation (CV): 0.541423819
Kurtosis: 42.79208353
Mean: 1051.9242
Median Absolute Deviation (MAD): 0
Skewness: 4.657915574
Sum: 16000819
Variance: 324372.1864

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1000 | 9714 | 63.9%
500 | 382 | 2.5%
320 | 204 | 1.3%
600 | 148 | 1.0%
150 | 126 | 0.8%
100 | 121 | 0.8%
400 | 84 | 0.6%
1500 | 70 | 0.5%
1200 | 66 | 0.4%
300 | 61 | 0.4%
Other values (1901) | 4235 | 27.8%

Minimum 5 values

Value | Count | Frequency (%)
2 | 3 | < 0.1%
3 | 2 | < 0.1%
5 | 2 | < 0.1%
7 | 2 | < 0.1%
8 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
9999 | 1 | < 0.1%
9316 | 1 | < 0.1%
9200 | 1 | < 0.1%
9000 | 1 | < 0.1%
8900 | 1 | < 0.1%
 

body_part_code
Real number (ℝ≥0)

Distinct count: 46
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.0023667083032
Minimum: 10
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Memory size: 118.8 KiB

Quantile statistics

Minimum: 10
5-th percentile: 14
Q1: 33
median: 38
Q3: 53
95-th percentile: 65
Maximum: 90
Range: 80
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 17.22066995
Coefficient of variation (CV): 0.4199920965
Kurtosis: 0.53521489
Mean: 41.00236671
Median Absolute Deviation (MAD): 15
Skewness: 0.4629484832
Sum: 623687
Variance: 296.5514737

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
36 | 1420 | 9.3%
42 | 1273 | 8.4%
53 | 1201 | 7.9%
18 | 1110 | 7.3%
14 | 1027 | 6.8%
55 | 898 | 5.9%
35 | 816 | 5.4%
38 | 707 | 4.6%
56 | 611 | 4.0%
54 | 566 | 3.7%
Other values (36) | 5582 | 36.7%

Minimum 5 values

Value | Count | Frequency (%)
10 | 25 | 0.2%
11 | 18 | 0.1%
12 | 35 | 0.2%
13 | 128 | 0.8%
14 | 1027 | 6.8%

Maximum 5 values

Value | Count | Frequency (%)
90 | 551 | 3.6%
66 | 78 | 0.5%
65 | 306 | 2.0%
63 | 96 | 0.6%
62 | 25 | 0.2%
 

cause_code
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1497.672736835185
Minimum: 1000
Maximum: 1900
Zeros: 0
Zeros (%): 0.0%
Memory size: 118.8 KiB

Quantile statistics

Minimum: 1000
5-th percentile: 1100
Q1: 1300
median: 1500
Q3: 1700
95-th percentile: 1900
Maximum: 1900
Range: 900
Interquartile range (IQR): 400

Descriptive statistics

Standard deviation: 236.2897539
Coefficient of variation (CV): 0.1577712861
Kurtosis: -0.8417619407
Mean: 1497.672737
Median Absolute Deviation (MAD): 200
Skewness: 0.1100268958
Sum: 22781100
Variance: 55832.84779

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1500 | 4090 | 26.9%
1300 | 2600 | 17.1%
1700 | 2512 | 16.5%
1900 | 1999 | 13.1%
1200 | 1873 | 12.3%
1600 | 1001 | 6.6%
1100 | 552 | 3.6%
1400 | 353 | 2.3%
1000 | 211 | 1.4%
1800 | 20 | 0.1%

Minimum 5 values

Value | Count | Frequency (%)
1000 | 211 | 1.4%
1100 | 552 | 3.6%
1200 | 1873 | 12.3%
1300 | 2600 | 17.1%
1400 | 353 | 2.3%

Maximum 5 values

Value | Count | Frequency (%)
1900 | 1999 | 13.1%
1800 | 20 | 0.1%
1700 | 2512 | 16.5%
1600 | 1001 | 6.6%
1500 | 4090 | 26.9%
 

claimant_age
Real number (ℝ≥0)

Distinct count: 84
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.17835776740517
Minimum: 1.0
Maximum: 91.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 118.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 23
Q1: 32
median: 40
Q3: 47
95-th percentile: 59
Maximum: 91
Range: 90
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 10.94983816
Coefficient of variation (CV): 0.2725307545
Kurtosis: -0.1070181697
Mean: 40.17835777
Median Absolute Deviation (MAD): 8
Skewness: 0.2088697528
Sum: 611153
Variance: 119.8989556

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
40 | 2511 | 16.5%
41 | 427 | 2.8%
43 | 375 | 2.5%
36 | 373 | 2.5%
47 | 371 | 2.4%
46 | 371 | 2.4%
45 | 362 | 2.4%
34 | 362 | 2.4%
39 | 361 | 2.4%
24 | 359 | 2.4%
Other values (74) | 9339 | 61.4%

Minimum 5 values

Value | Count | Frequency (%)
1 | 6 | < 0.1%
2 | 2 | < 0.1%
3 | 2 | < 0.1%
4 | 2 | < 0.1%
5 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
91 | 1 | < 0.1%
89 | 1 | < 0.1%
85 | 1 | < 0.1%
84 | 1 | < 0.1%
83 | 2 | < 0.1%
 

gender_code
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 118.8 KiB

Value | Count | Frequency (%)
M | 12201 | 80.2%
F | 2928 | 19.2%
U | 82 | 0.5%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

marital_status_code
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 118.8 KiB

Value | Count | Frequency (%)
unk | 12623 | 83.0%
U | 1378 | 9.1%
M | 1190 | 7.8%
S | 20 | 0.1%

Length

Max length: 3
Median length: 3
Mean length: 2.65971994
Min length: 1

claim_st_code
Categorical

Distinct count: 49
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 118.8 KiB

Value | Count | Frequency (%)
CA | 8798 | 57.8%
NY | 1112 | 7.3%
NM | 603 | 4.0%
GA | 528 | 3.5%
TX | 513 | 3.4%
NC | 491 | 3.2%
LA | 435 | 2.9%
NJ | 338 | 2.2%
FL | 231 | 1.5%
IL | 216 | 1.4%
Other values (39) | 1946 | 12.8%

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

depart_code
Categorical

Distinct count: 24
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 118.8 KiB

Value | Count | Frequency (%)
unk | 8001 | 52.6%
21.0 | 1151 | 7.6%
17.0 | 960 | 6.3%
8.0 | 845 | 5.6%
3.0 | 685 | 4.5%
6.0 | 573 | 3.8%
2.0 | 530 | 3.5%
14.0 | 465 | 3.1%
18.0 | 405 | 2.7%
11.0 | 337 | 2.2%
Other values (14) | 1259 | 8.3%

Length

Max length: 4
Median length: 3
Mean length: 3.271908487
Min length: 3

domestic_foreign_code
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 118.8 KiB

Value | Count | Frequency (%)
D | 14995 | 98.6%
F | 216 | 1.4%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

employ_status_code
Categorical

Distinct count: 12
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 118.8 KiB

Value | Count | Frequency (%)
7 | 12468 | 82.0%
1 | 2353 | 15.5%
8 | 184 | 1.2%
2 | 93 | 0.6%
C | 65 | 0.4%
4 | 18 | 0.1%
A | 13 | 0.1%
5 | 8 | 0.1%
6 | 3 | < 0.1%
9 | 2 | < 0.1%
Other values (2) | 4 | < 0.1%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

handling_office
Categorical

Distinct count: 28
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 118.8 KiB

Value | Count | Frequency (%)
LOS ANGELE | 7025 | 46.2%
SACRAMENTO | 1884 | 12.4%
DALLAS WC | 1341 | 8.8%
NEW JERSEY | 854 | 5.6%
LONG ISLAN | 531 | 3.5%
WC SOUTHEA | 517 | 3.4%
CHARLOTTE | 509 | 3.3%
IN-STATE A | 350 | 2.3%
ATLANTA | 294 | 1.9%
ILLINOIS | 272 | 1.8%
Other values (18) | 1634 | 10.7%

Length

Max length: 10
Median length: 10
Mean length: 9.700874367
Min length: 6

injury_city
Categorical

HIGH CARDINALITY

Distinct count: 1330
Unique (%): 8.7%
Missing: 0
Missing (%): 0.0%
Memory size: 118.8 KiB

Value | Count | Frequency (%)
LOS ANGELES | 2257 | 14.8%
UNKNOWN | 1526 | 10.0%
BURBANK | 1478 | 9.7%
NEW ORLEANS | 605 | 4.0%
NEW YORK | 389 | 2.6%
BROOKLYN | 385 | 2.5%
WILMINGTON | 319 | 2.1%
CULVER CITY | 291 | 1.9%
AUSTIN | 260 | 1.7%
ATLANTA | 221 | 1.5%
Other values (1320) | 7480 | 49.2%

Length

Max length: 19
Median length: 8
Mean length: 9.116889093
Min length: 1

injury_postal
Categorical

HIGH CARDINALITY

Distinct count: 1824
Unique (%): 12.0%
Missing: 0
Missing (%): 0.0%
Memory size: 118.8 KiB

Value | Count | Frequency (%)
91502 | 4991 | 32.8%
95816 | 392 | 2.6%
90038 | 298 | 2.0%
90028 | 216 | 1.4%
90001 | 216 | 1.4%
91505 | 194 | 1.3%
90232 | 168 | 1.1%
91504 | 165 | 1.1%
91608 | 156 | 1.0%
91521 | 125 | 0.8%
Other values (1814) | 8290 | 54.5%

Length

Max length: 9
Median length: 5
Mean length: 4.992308198
Min length: 3

injury_state_code
Categorical

HIGH CARDINALITY

Distinct count: 53
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 118.8 KiB

Value | Count | Frequency (%)
CA | 7914 | 52.0%
NY | 1346 | 8.8%
LA | 1127 | 7.4%
GA | 651 | 4.3%
NC | 573 | 3.8%
TX | 399 | 2.6%
NM | 381 | 2.5%
PA | 304 | 2.0%
MI | 271 | 1.8%
UT | 262 | 1.7%
Other values (43) | 1983 | 13.0%

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

jurisdiction_code
Categorical

Distinct count: 47
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 118.8 KiB

Value | Count | Frequency (%)
CA | 8970 | 59.0%
NY | 1514 | 10.0%
LA | 847 | 5.6%
GA | 562 | 3.7%
NC | 526 | 3.5%
TX | 350 | 2.3%
NM | 329 | 2.2%
PA | 236 | 1.6%
UT | 236 | 1.6%
IL | 233 | 1.5%
Other values (37) | 1408 | 9.3%

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

lost_time_or_medicalonly_code
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 118.8 KiB

Value | Count | Frequency (%)
MO | 11648 | 76.6%
LT | 3563 | 23.4%

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

nature_injury_code
Real number (ℝ≥0)

Distinct count: 45
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.6458483991848
Minimum: 1
Maximum: 91
Zeros: 0
Zeros (%): 0.0%
Memory size: 118.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 10
Q1: 36
median: 43
Q3: 52
95-th percentile: 59
Maximum: 91
Range: 90
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 17.15199328
Coefficient of variation (CV): 0.4118536166
Kurtosis: 0.0766656238
Mean: 41.6458484
Median Absolute Deviation (MAD): 9
Skewness: -0.4636156896
Sum: 633475
Variance: 294.1908735

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
52 | 3515 | 23.1%
40 | 2413 | 15.9%
59 | 2203 | 14.5%
10 | 1597 | 10.5%
49 | 1013 | 6.7%
43 | 784 | 5.2%
25 | 759 | 5.0%
37 | 664 | 4.4%
28 | 553 | 3.6%
36 | 200 | 1.3%
Other values (35) | 1510 | 9.9%

Minimum 5 values

Value | Count | Frequency (%)
1 | 72 | 0.5%
2 | 8 | 0.1%
3 | 4 | < 0.1%
4 | 177 | 1.2%
7 | 69 | 0.5%

Maximum 5 values

Value | Count | Frequency (%)
91 | 60 | 0.4%
90 | 115 | 0.8%
80 | 55 | 0.4%
78 | 14 | 0.1%
77 | 14 | 0.1%
 

#dependents
Categorical

Distinct count: 10
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 118.8 KiB

Value | Count | Frequency (%)
unk | 14657 | 96.4%
1.0 | 256 | 1.7%
2.0 | 173 | 1.1%
3.0 | 80 | 0.5%
4.0 | 28 | 0.2%
5.0 | 8 | 0.1%
9.0 | 3 | < 0.1%
6.0 | 3 | < 0.1%
7.0 | 2 | < 0.1%
18.0 | 1 | < 0.1%

Length

Max length: 4
Median length: 3
Mean length: 3.000065742
Min length: 3

osha_injury_type_code
Real number (ℝ≥0)

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0329366905528894
Minimum: 1.0
Maximum: 6.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 118.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 1
Maximum: 6
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2853534802
Coefficient of variation (CV): 0.276254569
Kurtosis: 140.2068312
Mean: 1.032936691
Median Absolute Deviation (MAD): 0
Skewness: 11.05400215
Sum: 15712
Variance: 0.08142660868

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1 | 14934 | 98.2%
2 | 156 | 1.0%
3 | 60 | 0.4%
4 | 27 | 0.2%
5 | 26 | 0.2%
6 | 8 | 0.1%

Minimum 5 values

Value | Count | Frequency (%)
1 | 14934 | 98.2%
2 | 156 | 1.0%
3 | 60 | 0.4%
4 | 27 | 0.2%
5 | 26 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
6 | 8 | 0.1%
5 | 26 | 0.2%
4 | 27 | 0.2%
3 | 60 | 0.4%
2 | 156 | 1.0%
 

severity_index_code
Real number (ℝ≥0)

SKEWED

Distinct count: 10
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.985273815002301
Minimum: 1.0
Maximum: 15.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 118.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 10
Q1: 10
median: 10
Q3: 10
95-th percentile: 10
Maximum: 15
Range: 14
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.3666488551
Coefficient of variation (CV): 0.03671895853
Kurtosis: 505.9408314
Mean: 9.985273815
Median Absolute Deviation (MAD): 0
Skewness: -20.44536795
Sum: 151886
Variance: 0.1344313829

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
10 | 15158 | 99.7%
1 | 19 | 0.1%
9 | 11 | 0.1%
7 | 6 | < 0.1%
14 | 5 | < 0.1%
6 | 4 | < 0.1%
15 | 2 | < 0.1%
4 | 2 | < 0.1%
2 | 2 | < 0.1%
5 | 2 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
1 | 19 | 0.1%
2 | 2 | < 0.1%
4 | 2 | < 0.1%
5 | 2 | < 0.1%
6 | 4 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
15 | 2 | < 0.1%
14 | 5 | < 0.1%
10 | 15158 | 99.7%
9 | 11 | 0.1%
7 | 6 | < 0.1%
 

type_loss_code
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 118.8 KiB

Value | Count | Frequency (%)
3 | 15059 | 99.0%
1 | 152 | 1.0%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

reforms_dummy
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 118.8 KiB

Value | Count | Frequency (%)
unk | 6241 | 41.0%
California Refom 1 | 4922 | 32.4%
California Refom 0 | 2962 | 19.5%
California Reform 2 | 1086 | 7.1%

Length

Max length: 19
Median length: 18
Mean length: 11.91696798
Min length: 3

length_employed
Real number (ℝ≥0)

Distinct count: 46
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.629478666754323
Minimum: 1.0
Maximum: 60.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 118.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 4
median: 7
Q3: 11
95-th percentile: 15
Maximum: 60
Range: 59
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.545754981
Coefficient of variation (CV): 0.5958146263
Kurtosis: 7.719798073
Mean: 7.629478667
Median Absolute Deviation (MAD): 3
Skewness: 1.228233581
Sum: 116052
Variance: 20.66388835

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
7 | 1806 | 11.9%
4 | 1342 | 8.8%
2 | 1234 | 8.1%
3 | 1121 | 7.4%
11 | 1069 | 7.0%
5 | 1050 | 6.9%
10 | 1033 | 6.8%
13 | 900 | 5.9%
12 | 887 | 5.8%
6 | 872 | 5.7%
Other values (36) | 3897 | 25.6%

Minimum 5 values

Value | Count | Frequency (%)
1 | 868 | 5.7%
2 | 1234 | 8.1%
3 | 1121 | 7.4%
4 | 1342 | 8.8%
5 | 1050 | 6.9%

Maximum 5 values

Value | Count | Frequency (%)
60 | 1 | < 0.1%
55 | 1 | < 0.1%
54 | 1 | < 0.1%
52 | 1 | < 0.1%
51 | 3 | < 0.1%
 

diff_carrier_employer
Real number (ℝ)

ZEROS

Distinct count: 246
Unique (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.903490894747223
Minimum: -1094.0
Maximum: 1994.0
Zeros: 3172
Zeros (%): 20.9%
Memory size: 118.8 KiB

Quantile statistics

Minimum: -1094
5-th percentile: 0
Q1: 1
median: 2
Q3: 5
95-th percentile: 27
Maximum: 1994
Range: 3088
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 44.82755059
Coefficient of variation (CV): 5.671867177
Kurtosis: 656.4433692
Mean: 7.903490895
Median Absolute Deviation (MAD): 2
Skewness: 18.6287836
Sum: 120220
Variance: 2009.509292

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1 | 3746 | 24.6%
0 | 3172 | 20.9%
2 | 1704 | 11.2%
3 | 1367 | 9.0%
4 | 1035 | 6.8%
5 | 703 | 4.6%
6 | 571 | 3.8%
7 | 449 | 3.0%
8 | 282 | 1.9%
9 | 164 | 1.1%
Other values (236) | 2018 | 13.3%

Minimum 5 values

Value | Count | Frequency (%)
-1094 | 1 | < 0.1%
-693 | 1 | < 0.1%
-363 | 1 | < 0.1%
-362 | 1 | < 0.1%
-346 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1994 | 1 | < 0.1%
1795 | 1 | < 0.1%
1245 | 1 | < 0.1%
1213 | 1 | < 0.1%
1141 | 2 | < 0.1%
 

diff_employer_injury
Real number (ℝ)

ZEROS

Distinct count: 411
Unique (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19.78745644599303
Minimum: -31.0
Maximum: 4200.0
Zeros: 11197
Zeros (%): 73.6%
Memory size: 118.8 KiB

Quantile statistics

Minimum: -31
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 19
Maximum: 4200
Range: 4231
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 163.4682898
Coefficient of variation (CV): 8.261207815
Kurtosis: 208.3063019
Mean: 19.78745645
Median Absolute Deviation (MAD): 0
Skewness: 13.04614826
Sum: 300987
Variance: 26721.88178

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0 | 11197 | 73.6%
1 | 1371 | 9.0%
2 | 435 | 2.9%
3 | 350 | 2.3%
4 | 243 | 1.6%
5 | 161 | 1.1%
7 | 126 | 0.8%
6 | 118 | 0.8%
8 | 63 | 0.4%
9 | 60 | 0.4%
Other values (401) | 1087 | 7.1%

Minimum 5 values

Value | Count | Frequency (%)
-31 | 1 | < 0.1%
-10 | 1 | < 0.1%
-5 | 1 | < 0.1%
-3 | 1 | < 0.1%
-2 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
4200 | 1 | < 0.1%
3889 | 1 | < 0.1%
3765 | 1 | < 0.1%
3524 | 1 | < 0.1%
3334 | 1 | < 0.1%
 

length_how_injury
Real number (ℝ≥0)

Distinct count: 51
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 57.90283347577411
Minimum: 7
Maximum: 60
Zeros: 0
Zeros (%): 0.0%
Memory size: 118.8 KiB

Quantile statistics

Minimum: 7
5-th percentile: 45
Q1: 59
median: 60
Q3: 60
95-th percentile: 60
Maximum: 60
Range: 53
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 5.714730541
Coefficient of variation (CV): 0.09869517946
Kurtosis: 17.47744287
Mean: 57.90283348
Median Absolute Deviation (MAD): 0
Skewness: -3.879369303
Sum: 880760
Variance: 32.65814516

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
60 | 10230 | 67.3%
59 | 2495 | 16.4%
58 | 232 | 1.5%
57 | 219 | 1.4%
56 | 165 | 1.1%
55 | 162 | 1.1%
54 | 137 | 0.9%
53 | 135 | 0.9%
51 | 129 | 0.8%
49 | 111 | 0.7%
Other values (41) | 1196 | 7.9%

Minimum 5 values

Value | Count | Frequency (%)
7 | 7 | < 0.1%
8 | 1 | < 0.1%
10 | 1 | < 0.1%
11 | 1 | < 0.1%
12 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
60 | 10230 | 67.3%
59 | 2495 | 16.4%
58 | 232 | 1.5%
57 | 219 | 1.4%
56 | 165 | 1.1%
 

shift
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 118.8 KiB

Value | Count | Frequency (%)
2nd | 6998 | 46.0%
1st | 6003 | 39.5%
3rd | 2210 | 14.5%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
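
The definition above can be written out directly. This is a minimal standard-library sketch, not the report's own implementation; the function name `pearson_r` and the sample data are illustrative only, and the inputs are assumed to be non-constant.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson_r(x, [2 * v + 1 for v in x]))      # exactly linear, increasing
print(pearson_r(x, [-0.5 * v + 10 for v in x]))  # exactly linear, decreasing
print(pearson_r(x, [1, 3, 2, 5, 4]))             # noisy increasing trend
```

Rescaling or shifting either input by a positive factor leaves the result unchanged, which is the invariance property described above.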

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
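
As a sketch of the rank-based calculation described above: the values are first replaced by their ranks, and the Pearson formula is then applied to those ranks. Assigning tied values the average of their positions is a common convention and an assumption here, since the report does not say how ties are treated. The helper names are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [v ** 3 for v in x]))  # monotonic but nonlinear
```

Because only the ordering matters, the cubic relationship in the example is treated as a perfect monotonic association.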

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, with that difference divided by the total number of pairs.
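
The pair-counting rule can be sketched as below. Taken literally, the formula in the text is the tau-a variant, which is what this example implements; statistical libraries often default to tau-b, which adjusts the denominator for ties, and the report does not state which variant it uses. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs, i.e. the tau-a variant."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 is a tie: counted as neither concordant nor discordant
    return (concordant - discordant) / len(pairs)

# Six pairs in total: five concordant, one discordant (the swapped 3 and 2).
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

This brute-force version visits every pair, so it is quadratic in the number of observations and meant only to illustrate the definition.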

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available (linked from the original report).

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013 (linked from the original report).
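
A minimal sketch of a bias-corrected Cramér's V, following the correction usually attributed to Bergsma (2013): the chi-squared based φ² is reduced by its expected value under independence, and the row and column counts are adjusted in the same spirit. The function name, the input format (a contingency table of counts) and the example tables are assumptions for illustration. The sketch assumes no empty rows or columns and is not taken from the report's source code.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

independent = [[20, 30], [40, 60]]  # rows are proportional to each other
associated = [[50, 0], [0, 50]]     # each row maps to exactly one column
print(cramers_v_corrected(independent))
print(cramers_v_corrected(associated))
```

The two example tables sit at the ends of the scale: proportional rows give no association, while a one-to-one mapping between rows and columns gives perfect association.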

Missing values

Sample

First rows

Dependent | ave_wkly_wage | body_part_code | cause_code | claimant_age | gender_code | marital_status_code | claim_st_code | depart_code | domestic_foreign_code | employ_status_code | handling_office | injury_city | injury_postal | injury_state_code | jurisdiction_code | lost_time_or_medicalonly_code | nature_injury_code | #dependents | osha_injury_type_code | severity_index_code | type_loss_code | reforms_dummy | length_employed | diff_carrier_employer | diff_employer_injury | length_how_injury | shift
098679500.046170021.0FunkCAunkD1LOS ANGELEPORTLAND97201ORCALT28unk1.09.03.0California Refom 014.04.00.0572nd
1557271037.042150040.0MunkCAunkD1IN-STATE APEARL HARBOR91502HIHILT521.01.010.03.0unk14.01.02.0511st
2986151226.090150049.0MunkIDunkD7LOS ANGELEBURBANK91502CACALT59unk1.010.03.0California Refom 07.08.0632.0601st
3513961000.018190051.0MunkIDunkD7LOS ANGELEBURBANK91502CACALT59unk1.010.03.0California Refom 07.01.0200.0601st
440791000.030150055.0MunkCAunkD7LOS ANGELEBURBANK91502CACALT78unk1.010.01.0California Refom 07.00.0444.0601st
519091129.042150049.0MunkCAunkD7LOS ANGELEUNKNOWN91502CACALT59unk1.010.03.0California Refom 015.03.01254.0601st
666871000.042150036.0MunkCAunkD7LOS ANGELEMISSION HILLS91345CACALT52unk1.010.03.0California Refom 07.010.01223.0601st
753521000.042150045.0MunkCAunkD7LOS ANGELEBURBANK91504CACALT52unk1.010.03.0California Refom 07.06.0436.0601st
863241000.054150045.0MunkCAunkD1LOS ANGELELAS VEGAS89104NVCALT52unk1.010.03.0California Refom 013.02.0642.0601st
9228280.048190058.0MunkCAunkD7LOS ANGELEUNKNOWN91502CACALT61unk1.010.01.0California Refom 07.0-25.0639.0601st

Last rows

Dependent | ave_wkly_wage | body_part_code | cause_code | claimant_age | gender_code | marital_status_code | claim_st_code | depart_code | domestic_foreign_code | employ_status_code | handling_office | injury_city | injury_postal | injury_state_code | jurisdiction_code | lost_time_or_medicalonly_code | nature_injury_code | #dependents | osha_injury_type_code | severity_index_code | type_loss_code | reforms_dummy | length_employed | diff_carrier_employer | diff_employer_injury | length_how_injury | shift
152012641000.061150038.0MunkVA18.0D7WC SOUTHEARICHMOND23222VAVAMO52unk1.010.03.0unk1.00.00.0601st
1520210341000.012170048.0MunkFL20.0D7WC SOUTHEASTONE MOUNTAIN30087GAFLMO7unk1.010.03.0unk1.02.00.0603rd
152039261000.054120060.0MunkAZ19.0D7WC SOUTHEAFAYETTEVILLE30214GAGAMO36unk1.010.03.0unk1.013.05.0602nd
152047801000.038150039.0FunkGA8.0D7WC SOUTHEASENOIA30276GAGAMO52unk1.010.03.0unk1.02.00.0593rd
1520501000.035120044.0MunkNC8.0D7WC SOUTHEAWILMINGTON28401NCNCMO40unk1.010.03.0unk1.00.00.0601st
1520624051000.053130021.0FunkGA6.0D7WC SOUTHEAFAYETTEVILLE30214GAGAMO37unk1.010.03.0unk1.00.00.0602nd
1520718076486.038150033.0MunkGA20.0D1WC SOUTHEAPEACHTREE CITY91502GAGALT52unk1.010.03.0unk1.00.00.0602nd
1520801000.055150033.0MUVA22.0D7WC SOUTHEAHENRICO23238VAVAMO49unk1.010.03.0unk1.01.00.0603rd
152095071000.033150034.0MunkGA3.0D7WC SOUTHEAHIRAM30141GAGAMO52unk1.010.03.0unk1.01.00.0602nd
1521001000.042150049.0FunkMO11.0D7WC SOUTHEANASHVILLE37214TNSCMO52unk1.010.03.0unk1.00.03.0603rd

Duplicate rows

Most frequent

Dependent | ave_wkly_wage | body_part_code | cause_code | claimant_age | gender_code | marital_status_code | claim_st_code | depart_code | domestic_foreign_code | employ_status_code | handling_office | injury_city | injury_postal | injury_state_code | jurisdiction_code | lost_time_or_medicalonly_code | nature_injury_code | #dependents | osha_injury_type_code | severity_index_code | type_loss_code | reforms_dummy | length_employed | diff_carrier_employer | diff_employer_injury | length_how_injury | shift | count
090307100.090150049.0MMCAunkD1LOS ANGELEBURBANK91502CACALT801.01.010.01.0California Refom 012.02.0145.0591st2
11260871000.090190039.0MUCAunkD7LOS ANGELEBURBANK91502CACALT591.01.010.03.0California Refom 112.00.0600.0601st2